Random Forest Regression

Data Processing

In [29]:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
%matplotlib inline
plt.rcParams['figure.figsize'] = [14, 8]

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values  # Level column, kept as a 2D matrix
y = dataset.iloc[:, 2].values    # Salary column, as a 1D vector
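The two `iloc` selections return different shapes. The slice `1:2` keeps a 2D matrix with a single column, which is the form scikit-learn estimators expect for `X`. The single index `2` drops to a 1D vector, which is the usual form for the target `y`. The sketch below shows this on a small hypothetical DataFrame; the column names and values are assumptions standing in for `Position_Salaries.csv`, not its actual contents.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv (made-up values):
# column 0 = job title, column 1 = level, column 2 = salary
toy = pd.DataFrame({
    "Position": ["Analyst", "Consultant", "Manager", "Director"],
    "Level": [1, 2, 3, 4],
    "Salary": [45000, 50000, 60000, 80000],
})

X_demo = toy.iloc[:, 1:2].values  # a slice keeps 2 dims: (n_samples, 1)
y_demo = toy.iloc[:, 2].values    # a single index gives 1D: (n_samples,)

print(X_demo.shape)  # (4, 1)
print(y_demo.shape)  # (4,)
```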

Fitting the Random Forest Regression Model to the dataset

In [26]:

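# Fit a forest of 300 regression trees; random_state fixes the
# bootstrap sampling so the results are reproducible
regressor = RandomForestRegressor(n_estimators = 300, random_state = 0)
regressor.fit(X, y)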

Out[26]:

RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=300, n_jobs=1,
           oob_score=False, random_state=0, verbose=0, warm_start=False)
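With `bootstrap=True` (the default shown above), each of the `n_estimators` trees is fitted on a bootstrap sample of the rows, and the forest predicts by averaging the trees' outputs. The minimal check below confirms that averaging by hand through the fitted trees in the `estimators_` attribute. It uses hypothetical toy data, not the salary dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: 10 levels with a steeply rising target
X_toy = np.arange(1, 11).reshape(-1, 1)
y_toy = 1000.0 * X_toy.ravel() ** 2

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_toy, y_toy)

# Each fitted tree is stored in forest.estimators_; average their outputs
query = np.array([[6.5]])
per_tree = np.array([tree.predict(query) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

print(manual_mean, forest.predict(query))  # the two values should agree
```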

Visualising the Random Forest Regression results (for higher resolution and smoother curve)

In [31]:

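# Predict on a fine grid of levels (step 0.01) so the step-shaped
# prediction curve is drawn at high resolution
X_grid = np.arange(X.min(), X.max(), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))  # predict() expects a 2D array
plt.scatter(X, y, color = 'red')  # observed salaries
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')  # model predictions
plt.title('Truth or Bluff (Random Forest Regression Model)')
plt.xlabel('Level')
plt.ylabel('Salary')
plt.show()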

[Figure: Truth or Bluff (Random Forest Regression Model), Salary vs. Level]
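The fine grid matters because every tree splits on thresholds that fall midway between observed levels, so the forest's prediction is piecewise constant in `Level`: it changes value only at a finite set of split points. Plotting the model only at the training levels would join those points with straight lines and hide the steps. The rough check below, on hypothetical toy data, counts how few distinct values the forest returns across a fine grid.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data on integer levels 1..10
X_toy = np.arange(1, 11).reshape(-1, 1)
y_toy = 1000.0 * X_toy.ravel() ** 2

forest = RandomForestRegressor(n_estimators=300, random_state=0)
forest.fit(X_toy, y_toy)

# Same idea as X_grid above: many closely spaced query points
grid = np.arange(X_toy.min(), X_toy.max(), 0.01).reshape(-1, 1)
preds = forest.predict(grid)

# Split thresholds are midpoints between training levels, so the
# predictions collapse into a small number of flat steps
n_steps = len(np.unique(preds))
print(len(grid), n_steps)
```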

Predicting a new result

In [30]:

# predict() expects a 2D array: one row per sample, one column per feature
y_pred = regressor.predict([[6.5]])
y_pred

Out[30]:

array([ 160333.33333333])

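Here the predicted salary for level 6.5 is about 160K (160,333), almost exactly the 160K salary claimed by the employee. The model also seems to do much better here than the Polynomial Regression model, although that comparison rests on this single prediction rather than on held-out data.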
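`predict` expects a 2D array of shape `(n_samples, n_features)`, which is why the single level is passed as `[[6.5]]`; current scikit-learn versions reject a bare scalar such as `6.5` with a `ValueError`. The same call can score several levels at once. The sketch below again uses hypothetical toy data in place of the salary table.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data standing in for the salary table
X_toy = np.arange(1, 11).reshape(-1, 1)
y_toy = 1000.0 * X_toy.ravel() ** 2

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_toy, y_toy)

# One row per query level, one column per feature
levels = np.array([[5.5], [6.5], [7.5]])
preds = forest.predict(levels)
print(preds.shape)  # (3,)

# A bare scalar is not a valid 2D input and is rejected
scalar_rejected = False
try:
    forest.predict(6.5)
except ValueError as err:
    scalar_rejected = True
    print("rejected:", err)
```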
